Feature Analysis

The standard features: LLD (low-level descriptors)
File path: \emotiondetection\features_labels_lld

Load the data

The Data class is defined in common.py.
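A minimal sketch of what the Data class might look like (this is a hypothetical reconstruction, not the actual common.py; the file names and directory layout are assumptions):

```python
import os
import numpy as np

class Data(object):
    """Loads pre-extracted feature/label matrices for one feature set.

    NOTE: hypothetical reconstruction; the real class lives in common.py
    and its on-disk layout may differ.
    """
    def __init__(self, name, root='features_labels_lld'):
        self.name = name  # feature-set name, e.g. 'lld'
        self.root = root  # directory holding the .npy files (assumed)

    def load_training_data(self):
        # assumed file names: <name>_train_features.npy / <name>_train_labels.npy
        self.feature = np.load(os.path.join(self.root, self.name + '_train_features.npy'))
        self.label = np.load(os.path.join(self.root, self.name + '_train_labels.npy'))

    def load_test_data(self):
        self.feature_test = np.load(os.path.join(self.root, self.name + '_test_features.npy'))
        self.label_test = np.load(os.path.join(self.root, self.name + '_test_labels.npy'))
```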

Get the training data:


In [48]:
import numpy as np
import os
from sklearn.manifold import TSNE

from common import Data

lld = Data('lld')
lld.load_training_data()
print 'training feature shape: ', lld.feature.shape
print 'training label shape: ', lld.label.shape

#lld.load_test_data()
#print 'test feature shape: ',lld.feature_test.shape
#print 'test label shape: ',lld.label_test.shape

training feature shape:  (9959L, 384L)
training label shape:  (9959L, 2L)

a. histogram

Plot histograms of a few features to see how they are distributed.


In [42]:
import matplotlib.pyplot as plt
%matplotlib inline  

feature_table = [1, 10, 100, 300]
for ind, fea in enumerate(feature_table):
    f = lld.feature[:, fea]

    plt.subplot(2, 2, ind+1)
    plt.hist(f)
    #plt.title("Histogram of feature " + str(fea))
plt.axis('tight')


Different features have different distributions.
Some appear to follow a Gaussian distribution.
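The eyeballed Gaussianity can be checked numerically; a sketch using sample skewness on synthetic stand-ins for two feature columns (real usage would slice columns out of lld.feature):

```python
import numpy as np

def skewness(x):
    """Sample skewness: near 0 for a symmetric (e.g. Gaussian) distribution."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 3).mean() / s ** 3

rng = np.random.RandomState(0)
gaussian_like = rng.normal(size=10000)  # stands in for a Gaussian-looking feature
skewed = rng.exponential(size=10000)    # stands in for a clearly non-Gaussian one

g_skew = skewness(gaussian_like)  # close to 0
e_skew = skewness(skewed)         # close to 2, the exponential's theoretical skewness
```

A strongly skewed column is a hint that Gaussian-assuming models (e.g. naive Bayes with Gaussian likelihoods) may fit it poorly.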

b. t-SNE

Use t-SNE to project the data to two dimensions and visually inspect how separable the classes are.


In [43]:
model = TSNE(n_components=2, random_state=0) # reduce the dimension to 2 for visualization
np.set_printoptions(suppress=True)
Y = model.fit_transform(lld.feature) # the reduced data (t-SNE is unsupervised, so labels are not needed)

In [47]:
plt.scatter(Y[:, 0], Y[:, 1], c=lld.label[:, 0], cmap=plt.cm.Spectral)
plt.title('training data')
plt.axis('tight')
 
print Y.shape


(9959L, 2L)

The classes are barely separable in the embedding : (
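The eyeball judgement can be quantified, for instance by the fraction of points whose nearest neighbour carries the same label; a sketch on synthetic 2-D point clouds standing in for the embedding Y (numpy only, no extra dependencies):

```python
import numpy as np

def nn_label_agreement(X, y):
    """Fraction of points whose nearest other point shares their label.

    Close to 1.0 for well-separated classes, close to chance level
    for heavily overlapping ones.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)   # exclude each point itself
    nn = d.argmin(axis=1)         # index of the nearest neighbour
    return (y[nn] == y).mean()

rng = np.random.RandomState(0)
n = 200
labels = np.repeat([0, 1], n)

# Two well-separated blobs vs. two heavily overlapping ones.
separated = np.vstack([rng.randn(n, 2), rng.randn(n, 2) + 8.0])
overlapping = np.vstack([rng.randn(n, 2), rng.randn(n, 2) + 0.3])

sep_score = nn_label_agreement(separated, labels)    # near 1.0
ovl_score = nn_label_agreement(overlapping, labels)  # near 0.5 (chance for 2 classes)
```

Running the same function on (Y, lld.label[:, 0]) would put a number on how mixed the classes look in the scatter plot.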

c. analyze which classification methods theoretically suit our data

  • Training data:
    9959 examples and 384 features.
    5 classes

  • The most widely used classifier, SVM: works well for 2 classes; a large feature dimension -> long computing time. $\surd$
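A sketch of how a linear SVM could be tried on data of this shape (here on a synthetic 5-class stand-in with 384 features, since only the shapes match our data; scikit-learn's LinearSVC handles the multi-class case one-vs-rest):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
n_per_class, n_features, n_classes = 200, 384, 5

# Synthetic stand-in: one Gaussian blob per class, with shifted centers.
X = np.vstack([rng.randn(n_per_class, n_features) + 3.0 * rng.randn(n_features)
               for _ in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

clf = LinearSVC(C=1.0)
clf.fit(X, y)
train_acc = clf.score(X, y)  # high on these well-separated blobs
```

On the real lld.feature the fit would be slower (9959 x 384) and, given the poor separability seen above, accuracy would be much lower; a non-linear kernel or feature selection might help.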